RM-R1 Qwen2.5 Instruct 32B
MIT
RM-R1 is a framework for reward modeling through reasoning trajectory generation, offering significantly better accuracy and interpretability than traditional reward modeling methods.
Large Language Model
Transformers English